FINAL REPORT:

COMPUTERIZED  ADAPTIVE  PERFORMANCE  EVALUATION 


Objectives 

The original objectives of this research program were (1) the development of
a psychometric basis for the construction, development, and evaluation of
criterion-referenced performance tests for use in the measurement of
achievement and (2) the development of psychometric methodology for
computerized adaptive performance simulation tests. A performance simulation
test was defined as an interactive problem-solving test in a particular area
of achievement.

Research in pursuance of these objectives began in February 1976 and
continued through January 1979. Technical reports were completed during the
period February 1979 through January 1980.

Approach 


Literature  Review 


Research began with a review of the literature on the problem of the
measurement of performance and achievement. Analysis of the literature
concerned with the measurement of achievement led to a restructuring of
project objectives.

Figure 1 summarizes the several approaches to the measurement of achievement
or performance that were identified in the review of the literature. As
Figure 1 shows, the measurement of achievement was determined to be
considerably more complex than the related problem of ability measurement.
The most prominent trend in the achievement measurement literature is the
use of population- or norm-referenced techniques borrowed from the field of
ability measurement.

In general, these techniques have been based on classical psychological test
theory, with the result that the obtained measurements and statements of
achievement or performance have differed for a given individual based on the
particular norming group to which the individual has been compared. In
addition, the use of classical test theory for achievement measurement makes
it difficult to apply adaptive testing techniques, because of the relatively
large numbers of items required for adaptive testing methods based on
classical test theory (Weiss, 1974).

The second major trend identified in the achievement measurement literature
was that of content- or criterion-referenced measurement. The problem of
criterion-referenced testing (also known as mastery testing) is quite
different from that of ability testing. As a result, a serious limitation of
the area of criterion-referenced measurement was that the psychometric
rationale for it was relatively undeveloped. In addition, virtually no
methodologies had been developed for the application of adaptive testing
techniques to the problem of criterion-referenced measurement. Thus, an
important objective of the project was to devise adaptive testing
methodologies uniquely applicable to the problem of criterion- (or content-)
referenced measurement.


FIGURE 1

The review identified a second important problem uniquely characteristic of
achievement measurement, which was not characteristic of ability
measurement. This problem was the fact that the measurement of achievement
frequently occurs as a result of an individual's exposure to a restricted
environment, such as a class or a training course. Typical of these
environments is a relatively short time-frame in which the change in an
individual's observed achievement level is to occur. Thus, an important
problem in the area of achievement measurement is measuring an individual's
achievement level over relatively short periods of time, including changes
in that achievement level as a function of time.


Such an approach to measurement can be called "time-referenced" measurement,
which evidences several important problems. Among these is the problem of
measuring change in an individual's achievement level from one point in time
to another relatively close point in time. Similar to the area of
criterion-referenced measurement, there was very little psychometric
rationale available in the literature for the measurement of individual gain
as required by a time-referenced measurement perspective.


A special case of time-referenced measurement is that of "stage-referenced"
measurement. In stage-referenced measurement, a particular theoretical
structure describing stages of achievement is superimposed on the
measurement problem. Thus, the achievement measurement problem becomes that
of determining whether an individual is progressing in achievement levels
according to the particular stage theory describing levels of achievement in
the specified achievement domain. Similar to the problems of time-referenced
and criterion-referenced measurement of achievement, there was very little
psychometric rationale available in the literature for the stage-referenced
measurement of achievement.

The review of the literature also identified several other problems that
are characteristic of the measurement of achievement, as compared to the
measurement of ability. One of these is that the goals of achievement
measurement are frequently embodied in the specification of particular
achievement domains. Frequently, these achievement domains are relatively
specific; and in the process of constructing achievement tests to measure
these domains, only a limited number of test items can be generated due to
the specificity of the domains. Thus, the measurement of achievement
frequently requires a multidimensional approach measuring specific content
domains using relatively small numbers of test items in comparison to those
used for the measurement of ability. As a result, traditional adaptive
testing models developed in the ability testing area may not be directly
applicable to the measurement of achievement. The literature thus suggested
that it might be necessary to develop adaptive testing strategies for the
measurement of achievement that were specifically designed to operate
efficiently with a large number of small content domains.

Finally, the review of the literature and some subsequent analysis of
instructional environments indicated that the measurement of performance by
computerized adaptive simulation techniques was considerably more complex
than had originally been anticipated. Additionally, the review indicated
that there was virtually no psychometric rationale available in the
literature for the measurement of performance by simulation. Although there
were some applications of performance simulation to the measurement of
achievement, analysis of the methodologies and attempts to apply those
methodologies in relevant instructional environments indicated that the
measurement of achievement by performance simulation was seriously
situation-bound. That is, it was extremely unlikely that any generalizable
methodologies could be developed that would be transferable across
instructional situations of different types. Consequently, after some
preliminary trial work with performance simulations, the objective of
developing a psychometric rationale for the measurement of achievement by
performance simulation was abandoned until more generalizable methodologies
could be identified.

Revised  Objectives 

The review of the literature thus led to a redefinition of project goals.
The revised project objectives were oriented around the development of
adaptive testing strategies designed to address the unique problems of the
measurement of achievement. The approach used was to first examine the
applicability of adaptive testing strategies developed in the ability
testing domain to relevant problems in the achievement testing domain. Then,
further efforts were oriented toward the development of adaptive testing
techniques specifically designed for the unique demands of achievement
testing, and an investigation of some of the unique problems of achievement
testing and analysis of some of the psychological aspects of the achievement
testing environment.

Results 


Applications  of  Item  Characteristic  Curve  Models  and  Adaptive  Testing  Strategies 

ICC models. The first technical report from the project (Research Report
77-5) investigated the question of whether item characteristic curve (ICC)
theory methods utilized in ability testing were applicable to data derived
from the measurement of achievement. This report described the ICC
calibration of an achievement testing item pool. Data used were derived from
a general biology course at the University of Minnesota. The item pool was a
multiple-choice set of items written by course instructors. In addition to
analyzing the applicability of ICC item calibration techniques to this item
pool, the dimensionality of the pool was examined in order to determine
whether unidimensional ICC theory was applicable to the measurement of this
domain of achievement. Results showed that the pool was generally
unidimensional and that it was possible to derive appropriate ICC parameters
from this pool.
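
As a point of reference for the ICC parameters discussed above, the
three-parameter logistic form commonly used for item characteristic curves
can be sketched as follows. This is an illustrative sketch only: the function
name and the example parameter values are assumptions and are not taken from
Research Report 77-5, and the scaling constant D = 1.7 is the conventional
value that brings the logistic curve close to the normal ogive.

```python
import math

D = 1.7  # conventional scaling constant (assumed here, not stated in the report)


def icc_3pl(theta, a, b, c=0.0):
    """Probability of a correct response under the three-parameter logistic model.

    theta: achievement level; a: discrimination; b: difficulty;
    c: lower asymptote ("guessing"). Setting c = 0 gives the two-parameter
    model; setting c = 0 with a common a for all items gives the
    one-parameter model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Hypothetical item with a = 1.2, b = 0.5, c = 0.2. When theta equals the
# difficulty b, the probability lies halfway between c and 1.
p_at_difficulty = icc_3pl(0.5, 1.2, 0.5, 0.2)
```

Calibrating an item pool amounts to estimating a, b, and c for each item from
response data; the sketch shows only how the parameters enter the curve.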

Adaptive testing strategies. Using this item pool, the next question
investigated was whether adaptive testing techniques developed for the
ability testing domain were applicable to the measurement of biology
achievement (Research Report 77-7). A stratified-adaptive (stradaptive) test
was administered to a group of students and compared with a conventional
classroom test derived from the same item pool, as well as with an improved
conventional test developed from the pool. Tests were compared in terms of
information (precision of measurement). Results showed that, as expected,
the adaptive test provided measurement of greater precision than did the
conventional tests. The results also indicated that the adaptive test
provided measurement of equal precision with considerably fewer items than
did the conventional tests. When the average number of items administered in
the adaptive test was equal to that of the conventional tests, adaptive test
scores were more precise than either the classroom conventional test or the
improved conventional test.

Although the demonstration of improved precision of measurement from
adaptive testing in comparison to conventional testing is supportive of the
general value of adaptive testing for measuring achievement, the question of
the relative validity of the two techniques was also important. In Research
Report 78-4 the comparative validity of adaptive and conventional
achievement tests was studied. Since it is very difficult in the achievement
domain to obtain a criterion against which the relative validity of two
testing techniques can be evaluated, the problem was approached by comparing
the respective construct validity of the two testing techniques. The results
of this study showed that the construct validity of the adaptive tests was
effectively higher than that of the conventional tests, since equal
validities were achieved for the two testing strategies, but the adaptive
tests required 25% to 35% fewer items than did the conventional tests.

Thus, these studies demonstrated the applicability of ICC techniques
previously applied almost exclusively in the area of ability testing, as
well as adaptive testing strategies developed for ability testing, to the
problem of achievement testing. Results indicated both higher precision of
measurement and higher effective levels of validity for the adaptive test.

ICC scoring methods. The process of examining the problem of the
applicability of ICC theory and adaptive testing techniques to the
measurement of achievement led to the development of a set of computer
programs for scoring achievement test data with ICC models. Since these
programs were written as general purpose programs, they were made available
in Research Report 79-1 for other researchers who desired to use ICC
methodologies in scoring achievement or ability tests.

In the process of implementing the reliability and validity studies
comparing adaptive and conventional testing strategies, decisions had to be
made about the appropriate ways of scoring the achievement test data using
ICC models. These decisions were necessary for both the conventional tests
and the adaptive tests. Thus, a relevant question concerned the
relationships among achievement level estimates using the one-, two-, and
three-parameter ICC models, as well as the maximum likelihood normal,
maximum likelihood logistic, and Bayesian methods for scoring ability test
data with ICC models.

To compare these scoring methods and models with each other, live data from
an achievement test were scored by all combinations of ICC models and
methods. The results (Research Report 79-3) indicated that highly similar
achievement level estimates were derived from the one- and two-parameter
models but that when the third (guessing) parameter was added to the scoring
procedures, the similarities among achievement level estimates decreased.
The data also indicated that the three-parameter model resulted in less
similar achievement level estimates for adaptive test data than for
conventional test data. However, at the same time, there were fewer
convergence failures for maximum likelihood scoring in adaptive test data
than there were in conventional test data.
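
The convergence failures of maximum likelihood scoring can be illustrated
with a sketch. For a response pattern that is entirely correct (or entirely
incorrect), the likelihood keeps rising as the achievement level moves toward
the extreme, so no finite maximum exists; an estimate that incorporates a
prior distribution remains finite. The code below is an assumption-laden
illustration, not the scoring programs of Research Report 79-1: it uses a
grid-based posterior mean under a normal prior as a simple stand-in for the
Bayesian scoring method, and the item parameters are hypothetical.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed)


def p_correct(theta, a, b, c=0.0):
    """Three-parameter logistic ICC, clamped away from 0 and 1 for safe logs."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return min(max(p, 1e-12), 1.0 - 1e-12)


def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern at achievement level theta."""
    total = 0.0
    for (a, b, c), u in zip(items, responses):
        p = p_correct(theta, a, b, c)
        total += math.log(p) if u else math.log(1.0 - p)
    return total


def posterior_mean_estimate(items, responses, prior_mean=0.0, prior_sd=1.0,
                            lo=-4.0, hi=4.0, n=161):
    """Posterior mean and variance under a normal prior, by grid integration."""
    grid = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    weights = []
    for t in grid:
        log_prior = -0.5 * ((t - prior_mean) / prior_sd) ** 2
        weights.append(math.exp(log_prior + log_likelihood(t, items, responses)))
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, var


# Hypothetical three-item test answered entirely correctly, as can happen
# when a conventional test is too easy for a testee.
items = [(1.0, -1.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
all_correct = [1, 1, 1]

# The log-likelihood increases without bound in theta for this pattern ...
rising = [log_likelihood(t, items, all_correct) for t in (1.0, 2.0, 3.0)]

# ... whereas the prior keeps the posterior mean finite.
estimate, variance = posterior_mean_estimate(items, all_correct)
```

Because an adaptive test selects items near the current estimate, uniform
response patterns are less likely to arise, which is consistent with the
smaller number of convergence failures reported for adaptive test data.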


Unique  Problems  of  Achievement  Testing 


In addition to studying the applicability of ICC models and adaptive test
procedures derived from ability testing to the problems of achievement
testing, the project was concerned with the development of solutions to some
of the unique problems raised in achievement testing, as well as the
analysis of the implications of some other unique characteristics of
achievement testing.


Multiple content areas. As indicated previously, one problem characteristic
of achievement testing, in contrast to ability testing, is the necessity to
measure an individual's achievement levels in a number of content areas at
the same time. In addition, in many cases the number of items available in a
content area is very restricted, resulting in relatively short tests that
would not permit the application of many standard adaptive testing
strategies.


Consequently, an adaptive testing strategy designed specifically for
achievement test batteries was developed (Research Report 77-6). This
strategy is applicable to achievement tests composed of any number of short
subtests. It is designed to use both intra-subtest adaptive item selection
and inter-subtest adaptive branching in order to reduce test battery length
to a minimum for each individual. The testing strategy uses a maximum
information ICC-based item selection technique combined with Bayesian
scoring to adaptively select items within a subtest until there are no items
left that provide more than trivial amounts of information about an
individual's achievement level. Once an achievement level estimate has been
obtained from one subtest, that estimate is used in a bivariate regression
equation to obtain a prior achievement level estimate for the next subtest
in the test battery. The strategy then adaptively selects items in the next
subtest, using the prior achievement level estimate, until no further items
are available for administration in that subtest. At the end of the second
subtest, multiple regression is used to obtain a prior achievement level
estimate to begin testing in the third subtest, and the process is repeated
until all subtests have been administered.
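
The flow of the battery strategy described above can be sketched in code. The
sketch is a simplified illustration under stated assumptions, not the
procedure of Research Report 77-6: it uses a grid-based posterior mean under
a normal prior in place of the report's Bayesian scoring method, the
threshold min_info standing for "trivial" information is an arbitrary value,
and all function names, item parameters, and regression weights are
hypothetical.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed)
GRID = [-4.0 + 0.05 * i for i in range(161)]  # levels used for integration


def p_correct(theta, a, b, c):
    """Three-parameter logistic ICC, clamped away from 0 and 1."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return min(max(p, 1e-12), 1.0 - 1e-12)


def item_information(theta, a, b, c):
    """Item information for the three-parameter model at level theta."""
    p = p_correct(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def posterior(items, responses, prior_mean, prior_sd):
    """Posterior mean and variance of theta under a normal prior."""
    weights = []
    for t in GRID:
        log_w = -0.5 * ((t - prior_mean) / prior_sd) ** 2
        for (a, b, c), u in zip(items, responses):
            p = p_correct(t, a, b, c)
            log_w += math.log(p if u else 1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, var


def adaptive_subtest(pool, answer, prior_mean, prior_sd=1.0, min_info=0.05):
    """Administer one subtest by maximum information item selection.

    pool: list of (a, b, c) items; answer: callable returning 1 or 0 for an
    item. Testing stops when the pool is exhausted or when the most
    informative remaining item provides less than min_info at the current
    estimate. Returns the final estimate and the number of items used.
    """
    remaining = list(pool)
    used, responses = [], []
    theta = prior_mean
    while remaining:
        best = max(remaining, key=lambda item: item_information(theta, *item))
        if item_information(theta, *best) < min_info:
            break
        remaining.remove(best)
        used.append(best)
        responses.append(answer(best))
        theta, _ = posterior(used, responses, prior_mean, prior_sd)
    return theta, len(used)


def regressed_prior(previous_estimates, intercept, weights):
    """Linear regression prediction of the prior mean for the next subtest."""
    return intercept + sum(w * t for w, t in zip(weights, previous_estimates))
```

With one previous estimate, regressed_prior corresponds to the bivariate
regression step; with two or more, to the multiple regression step. In use,
the estimate returned by adaptive_subtest for one subtest would be passed to
regressed_prior, and the result supplied as prior_mean for the next subtest.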

Results of applying this adaptive testing strategy to an achievement test
battery in a military testing environment, using real-data simulation
techniques, indicated an average 50% reduction in test length for the
individuals tested, with no loss in the quality of the obtained
measurements. Test length reductions varied from 18% to 80% across
individuals. Thus, considerable reductions in number of test items
administered were achieved while maintaining the quality of the measurements
obtained from the conventional test.

Mastery testing. Since the methodologies required to adaptively measure
mastery within a criterion-referenced framework are not the same as those
available for the measurement of ability levels, an adaptive testing
strategy for making mastery decisions was developed (Research Report 79-5).
This testing strategy utilized ICC theory and methodologies in conjunction
with a maximum information adaptive testing technique and Bayesian scoring.
The testing strategy was designed to use a prespecified and flexible mastery
level for comparison with each individual's performance.

The adaptive mastery testing strategy was compared with a conventional
mastery test in a military training environment, using real-data simulation.
When the results for the two testing strategies were compared, the adaptive
mastery testing strategy reduced the average test length by 30% to 81%
across the mastery decisions examined, with modal test length reductions up
to 92%, yet it reached the same decision as the conventional test for 96% of
trainees. Thus, again, considerable savings in the number of test items
administered were observed for the adaptive test, while it made decisions
which were highly similar to those made by the conventional test.
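
One way a mastery decision rule of this general kind could operate is
sketched below. The rule shown, which stops testing once an interval around
the current Bayesian estimate no longer contains the mastery level, is an
assumption made for illustration; this report does not state the decision
rule used in Research Report 79-5. The function name, the multiplier z, and
the example values are likewise hypothetical.

```python
import math


def mastery_decision(theta_hat, posterior_var, mastery_theta, z=1.96):
    """Classify a trainee or request more items.

    theta_hat: current achievement level estimate; posterior_var: its
    posterior variance; mastery_theta: the prespecified mastery level
    expressed on the same metric as theta_hat.
    """
    half_width = z * math.sqrt(posterior_var)
    if theta_hat - half_width > mastery_theta:
        return "master"
    if theta_hat + half_width < mastery_theta:
        return "nonmaster"
    return "continue"  # interval still contains the mastery level


# Hypothetical trainees evaluated against a mastery level of 0.0.
clear_master = mastery_decision(1.0, 0.04, 0.0)
clear_nonmaster = mastery_decision(-1.0, 0.04, 0.0)
undecided = mastery_decision(0.1, 0.25, 0.0)
```

Under such a rule, trainees whose achievement levels lie far from the mastery
level are classified after few items, while those near it receive more items,
which is one way large average reductions in test length could arise.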

Dimensionality of achievement over time. As indicated above, a unique
problem in the area of the measurement of achievement is that of measuring a
person's change in achievement level over a relatively short period of time.
If ICC theory is to be used in the measurement of achievement, it will gain
its highest degree of potential usefulness if it can be used to measure the
growth in one individual's achievement level from the beginning of
instruction to later points in instruction. However, the implementation of
this paradigm for the measurement of individual growth requires the
demonstration that an achievement test given at two or more points in time
measures the same achievement dimension and that the dimension measured is a
unidimensional variable. Research Report 79-4 reported results addressed to
this question.

Dimensionality was investigated within the pretest-test paradigm for
measuring change in achievement levels and within the test-posttest paradigm
for measuring retention. Data indicated that there were some questions about
the utility of the pretest-test paradigm, since a comparison of the ICC
parameter estimates obtained from achievement test items at two points in
time 4 weeks apart suggested a change in the dimensionality of achievement
over that period of instruction. These results were also supported by the
results of factor analyses. The data did, however, support the test-posttest
paradigm to measure retention, since a regression comparison of students'
achievement level estimates did not indicate any differences in the
achievement metric up to 1 month after the completion of instruction.
However, additional research is necessary in order to further verify and
examine these conclusions.

Effects of knowledge of results. The advent of computerized adaptive testing
also brings with it the potential of giving students, during the process of
testing, immediate feedback as to the correctness or incorrectness of their
test responses. Previous research in the ability testing domain (Betz &
Weiss, 1976a, 1976b; Prestwood & Weiss, 1978) suggests that the
administration of immediate knowledge of results for each test item during
the process of testing reduces the effects of extraneous variables on
ability test scores. However, if immediate feedback is to be administered to
students in an achievement testing environment, it is possible that the
information gained from feedback on prior items may affect a student's
performance on subsequent items in the test. A basic assumption of ICC
theory is that of local independence, that is, that the response of a
student to a given test item is the result only of the underlying
achievement variable, and not of other variables. If knowledge of results
from prior items in an achievement test affected a student's performance on
subsequent items, the assumption of local independence would be violated.

Research Report 80-1 was concerned with this issue. In two studies, data
derived from two groups of students (one of which received immediate
knowledge of results while the other received no knowledge of results) on
computer-administered tests were compared with each other. The results
indicated essentially no systematic differences in achievement level
estimates or in the dimensionality of the students' responses as a result of
the administration of immediate knowledge of results. Thus, the data
indicated that this added benefit of computerized administration of
achievement tests did not affect the assumptions under which ICC theory
could be applied in the achievement testing environment.


Major  Findings 

Summarized below are the major findings from this research program, with
references to the research reports in which these findings are reported. In
addition to these major findings, the original research reports should be
consulted for additional important results and conclusions.

1. The successful application of ICC theory to achievement
testing requires that the item pool be reasonably
unidimensional. Analyses of a large item pool, constructed
by the instructional staff of a university level course,
indicated that the pool was essentially unidimensional
(Research Reports 77-5 and 80-1).

2. When ICC item parameters were estimated from this item pool,
the majority of the items resulted in parameter estimates
that were suitable for operational testing purposes (Research
Report 77-5).

3. The ICC parameter estimates obtained from this item pool
reflected sufficiently high levels of discrimination and a
sufficient range of difficulty to be useful in adaptive
testing (Research Reports 77-5, 77-7, and 78-4).

4. Using operational achievement tests from military
instructional environments, it was possible to obtain usable
ICC item parameter estimates even in narrowly defined content
domains (Research Reports 77-6 and 79-5).

5. The item parameter data indicate that some caution might be
necessary, however, when estimating ICC item parameters in
achievement test data. Relatively high discrimination
parameter estimates in conjunction with high guessing
parameter estimates (Research Reports 77-5, 77-6, and 79-5)
may reflect a restriction in range on the achievement variable.
If the effect of instruction is to eliminate individual
differences in measured achievement, ICC parameter estimates
of discrimination and guessing obtained on groups at
their peak of instruction will be artificially inflated.
Additional research on this problem is necessary.

6. ICC theory and methods, combined with specially designed
adaptive testing strategies, can be useful in substantially
reducing the number of items administered to trainees in
an achievement test battery composed of a number of specific
content domains (Research Report 77-6).

7.  Both  adaptive  testing  techniques  and  ICC  theory  and  methods 
are  useful  in  reducing  test  lengths  for  tests  used  to  make 
mastery  decisions  (Research  Report  79-5). 

8. In a variety of applications to the problem of achievement
testing (including measuring achievement with a large
unidimensional item pool, measuring achievement levels in a
number of specific content domains, and measuring achievement
against a defined mastery criterion), adaptive testing
techniques using ICC theory can substantially reduce the
number of items required in an achievement test without
reducing the quality of the measurements (Research Reports
77-6 and 79-5).

9.  Adaptive  testing  can  improve  the  quality  of  achievement 
measurements  in  terms  of  both  precision  and  validity  while 
reducing  the  numbers  of  items  required  (Research  Reports 
77-7  and  78-4). 

10. ICC test scoring methods (Research Report 79-1) can be
fruitfully applied to achievement testing data (Research
Report 79-3). However, maximum likelihood ICC scoring is
less useful in conventional tests because of its
non-convergence problem when the test is too easy or too
difficult for a testee. Although non-convergences occur
much less frequently in adaptive test data, use of the
three-parameter ICC model with different scoring methods
tends to result in somewhat different achievement level
estimates. More research on this problem is indicated.

11. Because of its ability to equate testings and link item
pools onto a common metric, ICC theory has the potential
of offering solutions to the problem of measuring gains
in achievement levels during the process of instruction.
However, examination of the dimensionality of an achievement
test item pool from pre-instruction to the peak of
instruction shows changes in the dimensionality of achievement
during instruction (Research Report 79-4). These results,
if verified with other data, suggest potential problems in
the applicability of unidimensional ICC theory to the
measurement of individual growth in achievement levels due to
instruction.

12. The use of ICC methods to measure retention following
instruction was supported by the data (Research Report 79-4).
These results show that the same achievement variable was
measured up to 1 month after instruction as was measured at
the peak of instruction.

13. The post-instruction data (Research Report 79-4) also support
the use of computerized adaptive testing in operational
instructional environments. Since these data indicate that
the same achievement variable is measurable up to a month
after the end of instruction, instructional environments with
a limited number of testing terminals can obtain similar
measurements from trainees when tests are administered on
different days.

14.  The  use  of  unidimensional  ICC  theory  in  achievement  testing 
is  further  supported  by  the  lack  of  effect  on  dimensionality 
of  the  administration  of  immediate  knowledge  of  results 
during  the  process  of  achievement  testing  (Research  Report 
80-1). 


Implications  for  Further  Research 

The findings and experience of this 3-year research program strongly support
the use of ICC theory and methods and computerized adaptive testing for the
measurement of achievement. However, many new questions were raised by the
research (some of which were described above) and some of the original
questions addressed are still in need of further research. Portions of the
research described below are being pursued under a contract entitled
"Computerized Adaptive Achievement Testing," NR150-433, with the Personnel
and Training Research Programs of the Office of Naval Research, with funds
from the Defense Advanced Research Projects Agency, Army Research Institute,
Air Force Office of Scientific Research, and the Office of Naval Research.

Inter-Subtest  Branching 

Although Research Report 77-6 demonstrated that an adaptive testing strategy
using intra-subtest adaptive item selection in conjunction with
inter-subtest adaptive branching could substantially reduce test battery
length in one achievement test battery, the generality of this finding needs
to be examined. In addition, the relative efficiency of alternative
approaches to inter-subtest branching needs to be studied.

The  scoring  strategy  used  in  Research  Report  77-6  was  based  on  the  maxi¬ 
mum  information  item  selection  strategy  using  Bayesian  scoring.  However,  the 
use  of  Bayesian  scoring,  which  has  a  tendency  to  regress  achievement  estimates 
toward  the  mean,  may  result  in  the  premature  termination  of  the  intra-subtest 
item  selection,  particularly  when  used  in  conjunction  with  the  minimum 
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information  termination  criterion.  Thus,  a  relevant  area  of  research  is  that 
of  the  evaluation  of  intra-subtest  item  selection  strategies  that  may  elimin¬ 
ate  this  problem  and  identification  of  situations  under  which  use  of  Bayesian 
scoring  in  conjunction  with  maximum  information  item  selection  is  less  than 
optimal.

A  second  problem  in  intra-subtest  adaptive  item  selection  for  inter¬ 
subtest  branching  strategies  is  that  of  the  termination  criterion.  Research 
to  date  has  utilized  a  termination  criterion  based  on  minimum  information  at 
the  current  estimated  level  of  achievement.  However,  if  Bayesian  scoring  is 
to  be  used,  it  is  possible  to  terminate  on  the  basis  of  a  minimum  posterior 
Bayesian  variance  of  the  achievement  level  estimate.  The  relative  performance 
of  these  two  termination  criteria,  as  well  as  their  interactions  with  the  intra¬
subtest  item  selection  strategy,  needs  to  be  investigated.
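
The two termination criteria discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in Research Report 77-6: the function names, the scaling constant D = 1.7, and the thresholds `min_info` and `max_var` are all hypothetical choices made for the example. Items are assumed to follow the three-parameter logistic ICC model with parameters (a, b, c).

```python
import math

D = 1.7  # assumed scaling constant for the logistic ICC


def icc_3pl(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_information(theta, a, b, c):
    """Item information of one 3PL item at achievement level theta."""
    p = icc_3pl(theta, a, b, c)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2


def select_max_info_item(theta_hat, pool, used):
    """Return (index, information) of the unused item most informative at theta_hat,
    or None when every item in the pool has been administered."""
    best = None
    for i, (a, b, c) in enumerate(pool):
        if i in used:
            continue
        info = item_information(theta_hat, a, b, c)
        if best is None or info > best[1]:
            best = (i, info)
    return best


def stop_min_information(best, min_info):
    """Criterion 1: stop when no remaining item reaches min_info at theta_hat."""
    return best is None or best[1] < min_info


def stop_posterior_variance(posterior_var, max_var):
    """Criterion 2: stop when the Bayesian posterior variance is small enough."""
    return posterior_var <= max_var
```

Note that the first criterion depends on where the current estimate falls relative to the item pool, so an estimate regressed toward the mean changes which items look informative; the second depends only on the posterior variance supplied by the scoring routine.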

With  regard  to  branching  between  content  areas,  previous  research  has 
identified  one  means  of  ordering  subtests  for  inter-subtest  branching  and  has 
relied  exclusively  on  linear  multiple  regression  as  the  inter-subtest  achieve¬ 
ment  level  estimation  technique.  Other  prediction  strategies  are  available 
for  making  predictions  between  content  areas  and  there  are  other  ways  of  or¬ 
dering  subtests  to  be  used  in  inter-subtest  predictions.  In  addition,  the  use 
of  linear  multiple  regression  equations  brings  up  the  question  of  shrinkage 
with  regard  to  the  application  of  regression  equations  based  on  one  sample  of 
individuals  when  utilized  on  another  sample  from  the  same  population.  The 
effect  of  overestimation  and  shrinkage  needs  to  be  investigated  within  this 
inter-subtest  branching  strategy. 
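
As a rough illustration of the shrinkage question, the familiar adjusted R-squared formula gives one simple estimate of how much a squared multiple correlation is inflated in the sample on which the regression weights were derived. This is a sketch only: the function name and the example values are hypothetical, and the formula is a common textbook estimator rather than a method taken from the research reports. A direct check would apply the weights from one sample to a second sample and compare the correlations.

```python
def adjusted_r2(r2, n, k):
    """Adjusted squared multiple correlation.

    r2: squared multiple correlation observed in the derivation sample
    n:  number of individuals in the derivation sample
    k:  number of predictor subtests in the regression equation
    """
    if n - k - 1 <= 0:
        raise ValueError("the sample must be larger than the number of predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical values: the same observed R-squared shrinks far more in a small sample.
large_sample = adjusted_r2(0.5, 101, 10)
small_sample = adjusted_r2(0.5, 31, 10)
```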

Finally,  previous  research  has  indicated  that  there  is  wide  variability 
in  the  range  of  reduction  in  number  of  items  administered  across  subtests.
Thus,  a  relevant  question  is  the  nature  of  the  subtests  resulting  in
larger  or  smaller  reductions  due  to  the  use  of  the  inter-subtest  branching
strategy.  This  latter  question  is  most  efficiently  investigated  by  monte  carlo
simulation  studies  in  which  characteristics  of  the  subtests  are  systematically
varied.

Dimensionality  of  Achievement  Over  Time

As  indicated  above,  ICC  theory  has  the  potential  of  permitting  the  mea¬ 
surement  of  individual  growth  in  achievement  over  time  in  instruction.  But  the 
initial  results  in  Research  Report  79-4  suggest  that  the  achievement  dimension 
changes  from  pretest  to  end-of-course-unit  testing.  Thus,  further  examination 
of  this  problem  is  indicated. 

The  investigation  of  the  dimensionality  of  achievement  over  time  is  being 
studied  in  a  number  of  achievement  domains,  including  domains  that  are  primar¬ 
ily  cognitive  as  well  as  those  that  are  primarily  conceptual.  Obtained  data  on 
achievement  measured  at  various  points  in  time  will  be  factor  analyzed.  In 
each  case,  items  will  be  parameterized  by  ICC  models  and  the  change  of  these 
parameters  over  time  will  be  studied.  In  addition,  achievement  level  estimates 
based  on  factors  identified  at  relevant  points  in  time  will  be  obtained  and  the 
relationship  among  these  achievement  level  estimates  over  time  will  be  studied. 
The  relative  saliency  of  factors  identified  at  different  points  in  time  will 
also  be  analyzed  to  determine  whether  the  same  factors  are  evident  at  differ¬ 
ent  points  in  time  but  at  different  levels  of  saliency.  If  the  latter 
hypothesis  is  supported  by  the  data,  it  may  then  be  possible  to  investigate
inter-time  branching,  taking  into  account  the  relevant  saliency  of  those  dimen¬ 
sions  at  different  points  in  time. 

Depending  on  the  results  of  the  analyses  of  achievement  level  data  at 
different  points  in  time  over  a  number  of  instructional  contexts,  adaptive 
testing  strategies  for  inter-time  branching  will  be  developed  and  evaluated.
If  the  same  dimension  is  found  to  exist  with  different  saliencies  at  differ¬ 
ent  points  in  time,  the  utility  of  the  information  provided  at  the  prior  point 
in  time  with  respect  to  adaptive  testing  at  later  points  in  time  will  be 
studied  by  live  testing  and  by  real-data  simulation.  One  obvious  approach 
would  be  to  simply  use  the  correlation  of  achievement  level  estimates  on  a 
normative  group  from  earlier  points  in  time  with  later  points  in  time  as 
entry  points  into  later  time  achievement  level  estimation.  When  data  are 
available  at  more  than  one  prior  point  in  time,  the  use  of  multivariate  pre¬ 
diction  strategies  becomes  relevant,  and  the  relative  advantages  of  different 
strategies  will  need  to  be  investigated. 
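
The "obvious approach" mentioned above, using the correlation between earlier and later achievement level estimates in a normative group to set an entry point, amounts to a simple linear regression prediction. The sketch below assumes a single prior time point; the function and parameter names are hypothetical, and the normative means, standard deviations, and correlation would have to be estimated from real data.

```python
def entry_point(theta_prior, r, mean_prior, sd_prior, mean_later, sd_later):
    """Predict a later-time achievement level from an earlier estimate.

    r is the normative-group correlation between earlier and later estimates;
    the means and standard deviations describe the same normative group.
    """
    z_prior = (theta_prior - mean_prior) / sd_prior
    return mean_later + r * sd_later * z_prior
```

With r = 0 the prediction falls back to the later-time normative mean, so the prior estimate affects the entry point only to the extent that the two time points are correlated.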

Adaptive  Mastery  Testing 

An  adaptive  testing  strategy  for  making  mastery  decisions  was  developed  in 
Research  Report  79-5.  Although  the  data  in  that  report  indicate  some  promise 
for  this  ICC-based  mastery  testing  approach,  considerable  additional  study  of 
its  potential  as  a  solution  to  the  mastery  testing  problem  is  appropriate. 

First,  the  adaptive  mastery  testing  (AMT)  strategy  needs  to  be  studied 
in  additional  mastery  tests.  In  addition,  its  operating  characteristics  need 
to  be  examined  in  comparison  with  competitive  strategies  for  mastery  testing, 
including  strategies  based  on  Waldian  decision  theory. 

The  strategy  also  needs  to  be  examined  in  a  wide  variety  of  classification 
situations.  In  one  application  of  the  AMT  strategy,  error  may  be  associated 
primarily  with  the  criterion,  as  would  be  the  case  where  the  items  in  a  mastery 
test  are  all  of  similar  difficulty  and  discrimination;  hence,  the  maximum  in¬
formation  in  the  item  pool  is  concentrated  around  the  criterion  cutoff  value.
In  a  more  realistic  situation,  errors  are  associated  with  both  the  criterion 
and  the  individual  being  measured.  These  different  approaches  to  adaptive 
mastery  testing  should  be  compared  in  both  real-data  simulation  studies  and 
monte  carlo  simulation  studies.  The  real-data  simulation  studies  will  use  ex¬ 
isting  data  administered  in  a  conventional  test  format,  from  mastery  tests 
utilized  in  military  and  educational  environments,  to  determine  the  operating 
characteristics  of  these  two  major  approaches  to  AMT  as  well  as  to  evaluate  the 
outcomes  when  both  the  criterion  and  the  individual  are  measured  with  error.  If 
differential  results  are  obtained  using  these  strategies  in  real-data  simula¬ 
tion,  it  will  then  be  appropriate  to  design  monte  carlo  simulation  studies  to 
model  the  relevant  parameters  of  the  situation  (e.g.,  levels  of  item  difficulty, 
discrimination,  and  numbers  of  items,  as  well  as  various  degrees  of  error  on 
the  criterion)  and  to  compare  these  results  with  results  obtained  by  competing 
strategies. 

A  final  area  of  research  with  regard  to  AMT  is  the  generalization  of  the 
methodologies  to  the  multi-subtest  mastery  testing  problem.  Similar  to  the 
multi-subtest  achievement  testing  problem,  decisions  made  with  regard  to  one 
subtest  may  be  related  to  decisions  made  with  regard  to  another  subtest.  Thus, 
research  is  indicated  for  determining  how  much  information  derived  from  one
subtest  in  a  multi-subtest  mastery  test  can  be  used  in  adaptive  testing  in
other  subtests.  For  example,  achievement  level  estimates  generated  from  sub¬ 
tests  in  one  content  area  can  be  used  to  begin  adaptive  testing  in  another 
content  area.  In  some  cases,  only  very  few  items  will  be  necessary  to  make 
the  mastery  decisions  in  later  subtests  because  of  the  intercorrelations  among 
the  mastery  decisions  and/or  achievement  level  estimates  derived  from  other 
content  areas. 

Adaptive  Self-Referenced  Testing 

The  review  of  the  achievement  measurement  literature  indicated  the  lack 
of  a  coherent  framework  for  the  measurement  of  achievement.  Approaches  to  the 
measurement  of  achievement  such  as  norm-  or  population-referenced  testing  and 
criterion-referenced  (mastery)  testing  appeared  to  have  nothing  in  common  with 
each  other  and  little  or  no  implication  for  what  appears  to  be  the  important 
problem  in  the  measurement  of  achievement — that  of  measuring  individual  improve¬ 
ment  in  achievement  levels  during  the  process  of,  or  as  a  result  of,  instruc¬ 
tion.  The  activities  of  the  present  research  program  have  led  to  the  notion 
of  Adaptive  Self-Referenced  Testing  (ASRT),  which  appears  to  represent  a  co¬
herent  framework  for  the  measurement  of  achievement.  ASRT  can  incorporate
into  a  single  framework  the  notions  of  inter-subtest  branching,  inter-time 
branching,  and  mastery  testing. 

ASRT  is  only  possible  by  combining  computerized  adaptive  testing  and  ICC 
theory.  It  involves  the  measurement  of  growth  on  an  individual  basis,  incor¬ 
porating  knowledge  of  the  student’s  level  of  performance  at  an  earlier  point  in 
time,  which  is  used  as  a  starting  point  for  measurement  at  a  later  point  in 
time.  ASRT  is  designed  to  track  an  individual’s  growth  in  one  area  of  achieve¬ 
ment  as  a  function  of  time.  It  thus  can  be  used  to  identify  the  degree  and 
extent  of  learning  as  it  occurs  and  the  point  at  which  learning  occurs  or  fails 
to  occur  during  the  process  of  instruction.  The  generalization  of  unidimen¬
sional  self-referenced  testing  to  the  multidimensional  case  (i.e.,  where  more
than  one  content  area  is  being  measured)  incorporates  the  inter-test  branching 
problem.  The  objective  is  to  utilize,  on  an  individual  basis,  information 
gained  both  on  other  tests  and  at  prior  time  periods  for  the  measurement  of 
growth  in  learning  (achievement).

ASRT  is  unique  in  that  the  sequence  of  measurements  taken  to  measure  each 
individual’s  learning  history  is  based  only  on  that  individual’s  prior  perfor¬ 
mance  at  earlier  points  in  time  in  the  same  content  domain.  It  is  also  designed 
to  operate  uniquely  within  both  computer-assisted  and  computer-managed  instruc¬ 
tion.  If  properly  implemented,  it  should  be  an  extremely  powerful  approach  for 
measuring  achievement  in  these  contexts,  permitting  a  continuous  evaluation  of 
student  progress  and  a  non-normative  definition  of  "when  learning  has  occurred
and  how  much  has  been  learned,"  while  reducing  testing  time  to  a  minimum  for
each  student. 


Abstracts  of  Research  Reports


Research  Report  77-5 

Calibration  of  an  Item  Pool  for  the  Adaptive  Measurement  of  Achievement 

Isaac  I.  Bejar,  David  J.  Weiss,  and  G.  Gage  Kingsbury 

September  1977 

The  applicability  of  item  characteristic  curve  (ICC)  theory  to  a  multiple- 
choice  test  item  pool  used  to  measure  achievement  is  described.  The 
rationale  for  attempting  to  use  ICC  theory  in  an  achievement  framework  is 
summarized,  and  the  adequacy  for  adaptive  testing  of  a  conventional  class¬ 
room  achievement  test  item  pool  in  a  college  biology  class  is  studied.  Using 
criteria  usually  applied  to  ability  measurement  item  pools,  the  item  diffi¬ 
culties  and  discriminations  in  this  achievement  test  pool  were  found  to  be 
similar  to  those  used  in  adaptive  testing  pools  for  ability  testing.  Studies 
of  the  dimensionality  of  the  pool  indicate  that  it  is  primarily  unidimen¬ 
sional.  Analysis  of  the  item  parameters  of  items  administered  to  two 
different  samples  reveals  the  possibility  of  a  deviation  from  invariance  in 
the  discrimination  parameter,  but  a  high  degree  of  invariance  for  the  diffi¬ 
culty  parameter.  The  pool  as  a  whole,  as  well  as  two  subpools,  is  judged  to 
be  adequate  for  use  in  adaptive  testing.  It  is  also  concluded  that  the  ICC 
model  is  not  inappropriate  for  application  to  typical  college  classroom 
achievement  tests  similar  to  the  one  studied. 


Research  Report  77-6 

An  Adaptive  Testing  Strategy  for  Achievement  Test  Batteries 

Joel  M.  Brown  and  David  J.  Weiss 
October  1977 

An  adaptive  testing  strategy  is  described  for  use  with  achievement  tests 
that  cover  multiple  content  areas.  The  testing  strategy  combines  adaptive 
item  selection  both  within  and  between  the  subtests  in  the  multiple-subtest 
battery.  A  real-data  simulation  was  conducted  in  order  to  compare  the 
results  from  computerized  adaptive  testing  with  those  from  conventional 
paper-and-pencil  testing,  in  terms  of  test  information  and  test  length.
Data  for  the  simulation  consisted  of  test  results  for  365  fire-control  tech¬ 
nicians  on  a  paper-and-pencil  administration  of  a  232-item  achievement  test, 
which  was  divided  into  12  subtests,  each  covering  a  different  content  area. 
Correlations  between  subtest  scores  from  adaptive  and  conventional  testing 
were  .90  or  higher  for  11  of  the  12  content  areas.  An  information  analysis 
showed  that  for  all  12  subtests,  the  subtest  information  curves  from  adap¬ 
tive  testing  were  essentially  identical  to  the  corresponding  subtest  infor¬ 
mation  curves  from  conventional  testing.  On  the  average,  the  number  of 
items  administered  with  adaptive  testing  was  half  as  many  as  was  required 
with  conventional  testing;  the  shortest  adaptive  test  battery  used  18%  of 
the  total  number  of  items  in  the  conventional  test,  while  the  longest  used 
18%.  The  adaptive  testing  strategy,  therefore,  provided  a  considerable  re¬ 
duction  in  test  length  and  virtually  no  loss  in  precision  of  measurement  when 
compared  with  the  conventional  administration  of  the  achievement  test  battery. 


Research  Report  77-7

An  Information  Comparison  of  Conventional  and  Adaptive  Tests 
in  the  Measurement  of  Classroom  Achievement 

Isaac  I.  Bejar,  David  J.  Weiss,  and  Kathleen  A.  Gialluca 

October  1977 

The  information  provided  by  typical  and  improved  conventional  classroom  paper-
and-pencil  achievement  tests  is  compared  with  the  information  provided  by  an
adaptive  test  covering  the  same  subject  matter.  Both  tests  were  administered 
to  over  700  students  in  a  general  biology  course.  Using  the  same  scoring 
method,  adaptive  testing  was  found  to  yield  substantially  more  precise  esti¬ 
mates  of  achievement  level  than  the  conventional  test  throughout  the  entire 
range  of  achievement,  while  at  the  same  time  reducing  the  length  of  the  test. 
The  comparison  of  the  improved  conventional  test  with  the  stradaptive  test 
also  indicated  that  the  scores  derived  from  the  adaptive  test  were  more  pre¬ 
cise,  even  in  the  range  of  achievement  where  the  improved  test  was  designed 
to  be  optimal.  An  analysis  of  the  effects  of  expanding  an  adaptive  test  item 
pool  indicates  that  even  when  slightly  more  discriminating  items  are  added  to 
the  pool,  improved  precision  of  measurement  can  result.  A  comparison  of  re¬ 
sponse  pattern  information  values  (observed  information)  with  test  information 
values  (theoretical  information)  shows  that  the  observed  information  consis¬ 
tently  underestimates  theoretical  information,  although  the  pattern  of  results 
from  the  two  procedures  is  quite  similar.  It  is  concluded  that  the  adap¬ 
tive  measurement  of  classroom  achievement  results  in  scores  that  are  less 
likely  to  be  confounded  by  errors  of  measurement  and,  therefore,  are  more 
likely  to  reflect  a  testee’s  true  level  of  achievement.  In  addition,  the  re¬ 
duction  in  number  of  test  items  administered  by  the  adaptive  measurement  of 
achievement  can  result  in  additional  time  spent  in  instruction. 


Research  Report  78-4 

A  Construct  Validation  of  Adaptive  Achievement  Testing 

Isaac  I.  Bejar  and  David  J.  Weiss 
November  1978 

The  construct  validities  of  conventional  classroom  paper-and-pencil  and  adap¬ 
tive  achievement  tests  were  compared  using  data  from  two  independent  groups  of 
269  and  230  college  students.  Two  adaptive  achievement  tests  were  computer  ad¬ 
ministered  to  each  group  using  the  stradaptive  testing  strategy;  each  group 
also  completed  two  conventional  classroom  paper-and-pencil  achievement  tests. 
All  achievement  tests  were  drawn  from  the  same  pool  of  achievement  test  items 
on  which  item  characteristic  curve  (ICC)  parameters  had  been  determined. 
Students  were  also  administered  two  stradaptive  vocabulary  tests.  All  tests 
were  scored  by  maximum  likelihood  estimation  using  the  three-parameter  logis¬ 
tic  model.  A  nomological  net  was  specified,  describing  the  relationships  of 
the  achievement  tests  to  the  achievement  constructs  and  their  relationships 
with  the  vocabulary  construct  and  the  vocabulary  tests.  The  parameters  of  the 
net  were  estimated  by  fitting  the  observed  intercorrelations  among  the  test 
scores  to  the  nomological  net,  using  the  methodology  of  linear  structural 
equations.  Maximum  likelihood  estimates  of  the  parameters  of  the  nomological 
net  indicated  essentially  equal  validities  for  the  conventional  and  adaptive 
tests  in  four  comparisons.  However,  the  validity  of  the  adaptive  tests  was 
effectively  higher  than  that  of  the  conventional  tests,  since  equal  validities 
were  achieved  with  from  25%  to  31%  fewer  items.  The  data  also  permitted  an 
analysis  of  the  effects  of  verbal  ability  on  achievement  test  performance, 
separately  for  the  conventional  and  adaptive  tests.  The  results  from  a  con¬
firmatory  maximum  likelihood  factor  analysis  showed  a  larger  influence  of
verbal  ability  on  achievement  test  performance  at  the  first  administration  of 
the  adaptive  test.  This  result  was  attributed  to  a  necessity  to  learn  how  to 
use  the  computer  equipment  with  verbal  instructions,  which  may  have  further 
reduced  the  validity  of  the  adaptive  tests.  Combined  with  the  facts  that  the 
adaptive  tests  were  obtained  under  volunteer  conditions  while  the  conventional 
tests  were  obtained  under  "motivated"  grading  conditions,  the  results  of  this 
study  indicate  that  computer-administered  adaptive  tests  can  provide  more 
valid  measurement  of  achievement  than  conventional  classroom  paper-and-pencil
tests. 


Research  Report  79-1 

Computer  Programs  for  Scoring  Test  Data  with  Item  Characteristic  Curve  Models

Isaac  I.  Bejar  and  David  J.  Weiss 
February  1979 

Three  computer  programs  are  described  for  scoring  test  response  data  using  item 
characteristic  curve  (ICC),  or  latent  trait,  models.  The  rationale  and  math¬ 
ematical  basis  of  both  maximum  likelihood  and  Bayesian  ICC  scoring  methods  are 
presented,  as  well  as  some  data  comparing  the  two  methods  of  scoring.  The 
three  computer  programs  are  designed  for  scoring  conventional  (linear)  test 
data  (LINDSCO)  in  dichotomous  response  format,  adaptive  test  dichotomous  data 
(ADADSCO),  and  conventional  (linear)  test  data  scored  by  polychotomous  ICC
models  (LINPSCO).  Options  available  in  these  three  general  purpose  programs
are  described,  and  examples  of  the  input  and  output  are  given  for  each  program. 
Complete  FORTRAN  listings  of  the  three  programs  are  included. 


Research  Report  79-3 

Relationships  Among  Achievement  Level  Estimates  from  Three 
Item  Characteristic  Curve  Scoring  Methods 

G.  Gage  Kingsbury  and  David  J.  Weiss 
April  1979 

This  study  compared  achievement  level  estimates  from  three  item  characteristic 
curve  (ICC)  scoring  methods  using  the  one-,  two-,  and  three-parameter  ICC 
models.  The  three  scoring  methods  were  maximum  likelihood  normal,  maximum 
likelihood  logistic,  and  Owen’s  (1975)  Bayesian  scoring  method.  Data  included 
all  possible  response  patterns  from  a  hypothetical  five-item  test,  as  well  as 
response  patterns  from  live  administration  of  a  conventional  classroom  and  a 
computerized  adaptive  achievement  test.  For  the  conventional  and  adaptive 
test  data,  correlations  among  achievement  level  estimates  were  examined  as  a 
function  of  test  length.  Results  for  all  data  sets  showed  a  high  degree  of 
similarity  among  θ  estimates  for  the  one-  and  two-parameter  data,  with  slight
decreases  in  correlations  as  information  on  the  discrimination  parameter  was 
used  in  scoring.  When  the  third  ("guessing")  parameter  was  used  in  scoring
the  item  response  data,  correlations  among  θ  estimates  were  reduced,  particu¬
larly  for  the  adaptive  test  data.  The  data  also  showed  an  increasing  tendency
for  the  maximum  likelihood  methods  to  result  in  convergence  failures  as  the 
third  parameter  of  the  ICC  was  used  in  scoring.  In  general,  however,  the 
adaptive  test  data  were  less  likely  to  result  in  convergence  failures  than 
were  the  conventional  test  data.  The  data  also  illustrated  how  each  of  the 
three  scoring  methods  tends  to  utilize  ICC  parameter  information  in  arriving  at
θ  estimates  and  the  relationships  of  these  estimates  to  a  number  correct  scor¬
ing  philosophy.  Advantages  and  disadvantages  of  each  of  the  scoring  methods
are  discussed.  It  is  suggested  that  future  research  examine  the  relative 
validities  of  scoring  methods  and  model  combinations. 
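
The contrast between maximum likelihood and Bayesian scoring described in this abstract can be illustrated with a small grid-based sketch. It rests on several assumptions that are not from the report: the grid from -4 to 4, the scaling constant D = 1.7, the item parameters in the usage note, and above all the Bayesian scorer, which is a plain posterior mean under a standard normal prior used as a stand-in and is not Owen's (1975) sequential method. A maximum that lands on the edge of the grid plays the role of a convergence failure.

```python
import math

D = 1.7  # assumed scaling constant for the logistic ICC
GRID = [i / 100.0 for i in range(-400, 401)]  # candidate theta values


def icc_3pl(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def likelihood(theta, items, responses):
    """Likelihood of a 0/1 response pattern at theta, assuming local independence."""
    value = 1.0
    for (a, b, c), u in zip(items, responses):
        p = icc_3pl(theta, a, b, c)
        value *= p if u == 1 else 1.0 - p
    return value


def ml_estimate(items, responses):
    """Grid-search maximum likelihood estimate of theta.

    Returns None when the maximum sits on the grid boundary, which is how an
    estimate that does not converge to a finite value shows up in this sketch.
    """
    best = max(GRID, key=lambda t: likelihood(t, items, responses))
    if best in (GRID[0], GRID[-1]):
        return None
    return best


def bayes_estimate(items, responses):
    """Posterior mean of theta under a standard normal prior (stand-in scorer)."""
    weights = [likelihood(t, items, responses) * math.exp(-0.5 * t * t) for t in GRID]
    total = sum(weights)
    return sum(t * w for t, w in zip(GRID, weights)) / total
```

For a hypothetical five-item test such as `[(1.0, b, 0.0) for b in (-1.0, -0.5, 0.0, 0.5, 1.0)]`, an all-correct pattern has no finite maximum likelihood estimate, so `ml_estimate` returns None, whereas `bayes_estimate` still returns a finite value pulled toward the prior mean.
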


Research  Report  79-4

Effect  of  Point-in-Time  in  Instruction  on  the  Measurement  of  Achievement

G.  Gage  Kingsbury  and  David  J.  Weiss 
August  1979 

Item  characteristic  curve  (ICC)  theory  has  potential  for  solving  some  of  the 
problems  inherent  in  the  pretest-test  and  test-posttest  paradigms  for  measur¬ 
ing  change  in  achievement  levels.  However,  if  achievement  tests  given  at 
different  points  in  the  course  of  instruction  tap  different  achievement  di¬ 
mensions,  the  use  of  ICC  approaches  and/or  change  scores  from  these  tests  is 
not  desirable.  This  problem  is  investigated  in  two  studies  designed  to  de¬ 
termine  whether  or  not  achievement  tests  administered  at  different  times  during 
a  sequence  of  instruction  actually  measure  the  same  achievement  dimensions.  To 
investigate  possible  changes  in  dimensionality  between  different  points  in  in¬ 
struction,  aspects  of  the  dimensionality  of  achievement  test  data  were  examined 
prior  to  instruction,  at  the  peak  of  instruction,  and  up  to  a  month  following 
the  peak  of  instruction.  Data  used  were  conventional  and  adaptive  achievement 
test  data  administered  to  students  in  a  general  biology  course  at  the  Univer¬ 
sity  of  Minnesota.  Results  raised  questions  about  the  utility  of  the  pretest- 
test  paradigm  for  measuring  change  in  achievement  levels,  since  a  comparison 
of  ICC  parameter  estimates  indicated  that  a  change  in  the  dimensionality  of 
achievement  had  occurred  within  the  short  (4-week)  period  of  instruction.  This 
change  was  also  observed  using  a  factor  analytic  comparison.  Use  of  the  test-
posttest  paradigm  to  measure  retention  was  supported,  since  a  regression  com¬
parison  of  students'  achievement  level  estimates  did  not  indicate  any  signifi¬
cant  change  in  the  achievement  metric  up  to  1  month  after  the  peak  of  instruc¬
tion.  The  significance  of  this  result  for  the  use  of  adaptive  testing
technology  in  measuring  achievement  is  described.  Implications  of  these 
studies  and  the  use  of  ICC  theory  in  the  measurement  of  achievement,  as  well  as 
some  potential  limitations  in  terms  of  generalizability  of  these  results,  are 
discussed.


Research  Report  79-5 

An  Adaptive  Testing  Strategy  for  Mastery  Decisions 

G.  Gage  Kingsbury  and  David  J.  Weiss 
September  1979 

In  an  attempt  to  increase  the  efficiency  of  mastery  testing  while  maintaining 
a  high  level  of  confidence  for  each  mastery  decision,  the  theory  and  technolo¬ 
gy  of  item  characteristic  curve  (ICC)  response  theory  (Lord  &  Novick,  1968) 
and  adaptive  testing  were  applied  to  the  problem  of  judging  individuals'  com¬
petencies  against  a  prespecified  mastery  level  to  determine  whether  each  indi¬
vidual  is  a  "master"  or  a  "nonmaster"  of  a  specified  content  domain.  Items
from  two  conventionally  administered  classroom  mastery  tests  administered  in  a 
military  training  environment  were  calibrated  using  the  unidimensional  three- 
parameter  logistic  ICC  model.  Then,  using  response  data  originally  obtained 
from  the  conventional  administration  of  the  tests,  a  computerized  adaptive 
mastery  testing  (AMT)  strategy  was  applied  in  a  real-data  simulation.  The  AMT 
procedure  used  ICC  theory  to  transform  the  arbitrary  "proportion  correct" 
mastery  level  used  in  traditional  mastery  testing  to  the  ICC  achievement  metric 
in  order  to  allow  the  adaptation  of  the  test  to  each  trainee’s  achievement 
level  estimate,  which  was  calculated  after  each  item  response.  Adaptive  test¬ 
ing  continued  until  the  95%  Bayesian  confidence  interval  around  the  trainee's 
achievement  level  estimate  failed  to  contain  the  prespecified  mastery  level.
At  that  point  testing  was  terminated,  and  a  mastery  decision  was  made  for  the 
trainee.  Results  obtained  from  the  AMT  procedure  were  compared  to  results
obtained  from  the  traditional  mastery  testing  paradigm  in  terms  of  the  reduc¬ 
tion  in  mean  test  length,  information  characteristics,  and  the  correspondence 
between  decisions  made  by  the  two  procedures  for  three  different  mastery  levels 
and  for  each  of  the  two  tests.  The  AMT  procedure  reduced  the  average  test
length  30%  to  81%  over  all  circumstances  examined  (with  modal  test  length  re¬ 
ductions  of  up  to  92%)  while  reaching  the  same  decision  as  the  conventional 
procedure  for  96%  of  the  trainees.  Additional  advantages  and  possible  ap¬ 
plications  of  AMT  procedures  in  certain  classroom  situations  are  noted  and 
discussed,  and  further  research  questions  are  suggested. 
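
The two steps of the AMT procedure described in this abstract, converting a proportion-correct mastery level to the ICC achievement metric and stopping when the confidence interval excludes that level, can be sketched as follows. This is an illustration under assumptions, not the implementation from Research Report 79-5: the conversion is done here by bisection on the test characteristic curve, the interval is a normal approximation (estimate plus or minus 1.96 posterior standard deviations), and all names, the search range, and D = 1.7 are hypothetical.

```python
import math

D = 1.7  # assumed scaling constant for the logistic ICC


def icc_3pl(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def mastery_level_on_theta(items, proportion, lo=-4.0, hi=4.0, tol=1e-6):
    """Find the theta at which the expected proportion correct equals `proportion`.

    Uses bisection on the test characteristic curve, which increases with theta.
    """
    def tcc(theta):
        return sum(icc_3pl(theta, a, b, c) for a, b, c in items) / len(items)

    if not tcc(lo) <= proportion <= tcc(hi):
        raise ValueError("proportion lies outside the test characteristic curve range")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tcc(mid) < proportion:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def mastery_decision(theta_hat, posterior_var, theta_cut, z=1.96):
    """Return 'master', 'nonmaster', or None if testing should continue.

    A decision is made once the interval theta_hat +/- z * sd no longer
    contains the mastery level theta_cut.
    """
    half_width = z * math.sqrt(posterior_var)
    if theta_hat - half_width > theta_cut:
        return "master"
    if theta_hat + half_width < theta_cut:
        return "nonmaster"
    return None  # interval still contains the cutoff: administer another item
```

The sketch leaves open what to do when the item pool is exhausted before the interval excludes the mastery level; some fallback rule would be needed in practice.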


Research  Report  80-1 

Effects  of  Immediate  Knowledge  of  Results  on  Achievement 
Test  Performance  and  Test  Dimensionality

Kathleen  A.  Gialluca  and  David  J.  Weiss 
January  1980 

These  two  studies  investigated  the  effects  of  administering  immediate  knowledge 
of  results  (KR)  concerning  the  correctness  or  incorrectness  of  each  item  re¬ 
sponse  on  a  computerized  adaptive  test  of  Biology  achievement.  In  the  case  of 
incorrect  responses,  the  correct  answers  were  provided  to  the  student.  The 
results  of  these  studies  indicate  that  the  provision  of  informative  KR  did  not 
systematically  increase  total  test  scores,  as  would  be  expected  if  students 
were  using  information  from  previously  administered  items  to  help  them  answer 
subsequent  items.  Furthermore,  provision  of  informative  KR  did  not  alter  the 
dimensionality  of  the  achievement  tests  administered,  indicating  that  the 
latent  trait  model  assumption  of  local  independence  among  the  items  was  not 
affected  to  any  significant  degree. 